
    Optimal quantum control of Bose Einstein condensates in magnetic microtraps

    Transport of Bose-Einstein condensates in magnetic microtraps, controllable by external parameters such as wire currents or radio-frequency fields, is studied within the framework of optimal control theory (OCT). We derive from the Gross-Pitaevskii equation the optimality system for the OCT fields that efficiently channel the condensate between given initial and desired states. For a variety of magnetic confinement potentials we study transport and wavefunction splitting of the condensate, and demonstrate that OCT drastically outperforms simpler schemes for the time variation of the microtrap control parameters. Comment: 11 pages, 7 figures
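    The kind of gradient-based pulse shaping that OCT performs can be illustrated on a much smaller toy problem. The sketch below is an assumption-laden stand-in: a two-level system replaces the Gross-Pitaevskii dynamics, and plain finite-difference gradient ascent replaces the paper's optimality system; all parameter values are illustrative.

```python
import numpy as np

# Toy stand-in for the paper's setting: a two-level system instead of the
# Gross-Pitaevskii equation, and finite-difference gradient ascent instead
# of the paper's optimality system. All parameter values are illustrative.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def evolve(u, dt=0.1):
    """Propagate |0> under piecewise-constant H = sz + u_k * sx."""
    psi = np.array([1, 0], dtype=complex)
    for uk in u:
        w, V = np.linalg.eigh(sz + uk * sx)          # H is Hermitian
        psi = V @ (np.exp(-1j * w * dt) * (V.conj().T @ psi))
    return psi

def fidelity(u):
    return abs(evolve(u)[1]) ** 2                    # overlap with target |1>

rng = np.random.default_rng(0)
u = 0.3 * rng.standard_normal(20)                    # random initial pulse
f0 = fidelity(u)
eps, lr = 1e-4, 1.0
for _ in range(300):                                 # finite-difference ascent
    grad = np.array([(fidelity(u + eps * e) - fidelity(u - eps * e)) / (2 * eps)
                     for e in np.eye(len(u))])
    u += lr * grad

print(f"fidelity: naive start {f0:.3f} -> optimized {fidelity(u):.3f}")
```

    The optimized pulse substantially beats the unoptimized one, which is the qualitative point of the abstract's comparison against simpler control schemes.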

    Learning-aided Stochastic Network Optimization with Imperfect State Prediction

    We investigate the problem of stochastic network optimization in the presence of imperfect state prediction and non-stationarity. Based on a novel distribution-accuracy curve prediction model, we develop the predictive learning-aided control (PLC) algorithm, which jointly utilizes historic and predicted network state information for decision making. PLC is an online algorithm that requires zero a priori system statistical information, and consists of three key components, namely sequential distribution estimation and change detection, dual learning, and online queue-based control. Specifically, we show that PLC simultaneously achieves good long-term performance, short-term queue size reduction, accurate change detection, and fast algorithm convergence. In particular, for stationary networks, PLC achieves a near-optimal $[O(\epsilon), O(\log^2(1/\epsilon))]$ utility-delay tradeoff. For non-stationary networks, PLC obtains an $[O(\epsilon), O(\log^2(1/\epsilon) + \min(\epsilon^{c/2-1}, e_w/\epsilon))]$ utility-backlog tradeoff for distributions that last $\Theta(\frac{\max(\epsilon^{-c}, e_w^{-2})}{\epsilon^{1+a}})$ time, where $e_w$ is the prediction accuracy and $a = \Theta(1) > 0$ is a constant (the Backpressure algorithm \cite{neelynowbook} requires an $O(\epsilon^{-2})$ length for the same utility performance with a larger backlog). Moreover, PLC detects distribution change $O(w)$ slots faster with high probability ($w$ is the prediction size) and achieves an $O(\min(\epsilon^{-1+c/2}, e_w/\epsilon) + \log^2(1/\epsilon))$ convergence time. Our results demonstrate that state prediction (even imperfect) can help (i) achieve faster detection and convergence, and (ii) obtain better utility-delay tradeoffs
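    The online queue-based control component that PLC builds on can be sketched with a standard drift-plus-penalty rule for a single queue. Everything here, the log(1+a) utility, the uniform service process, and the parameter values, is illustrative rather than taken from the paper.

```python
import numpy as np

# Minimal sketch of the online queue-based control layer PLC builds on:
# a standard drift-plus-penalty rule for a single queue. The log(1+a)
# utility, uniform service process, and parameter values are illustrative.
rng = np.random.default_rng(1)

def run(V, slots=20_000, capacity=1.0):
    Q = total_util = q_sum = 0.0
    for _ in range(slots):
        # admit a in [0,1] maximizing V*log(1+a) - Q*a (closed form)
        a = 1.0 if Q == 0 else min(1.0, max(0.0, V / Q - 1.0))
        service = capacity * rng.random()             # i.i.d. service offered
        Q = max(Q + a - service, 0.0)
        total_util += np.log1p(a)
        q_sum += Q
    return total_util / slots, q_sum / slots          # avg utility, avg backlog

results = {V: run(V) for V in (1.0, 10.0, 100.0)}
for V, (u, q) in results.items():
    print(f"V={V:>5}: utility/slot={u:.3f}  avg queue={q:.1f}")
```

    Raising V shrinks the utility gap at the price of a proportionally larger backlog, the classical utility-delay tradeoff with ϔ = 1/V; PLC's learning components are what sharpen the queue scaling to the logarithmic form quoted above.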

    Ancilla-assisted sequential approximation of nonlocal unitary operations

    We consider the recently proposed "no-go" theorem of Lamata et al. [Phys. Rev. Lett. 101, 180506 (2008)] on the impossibility of sequential implementation of global unitary operations with the aid of an itinerant ancillary system, and view the claim within the language of the Kraus representation. By virtue of an extremely useful tool for analyzing entanglement properties of quantum operations, namely the operator-Schmidt decomposition, we provide an alternative proof of the "no-go" theorem and also study the role of initial correlations between the qubits and the ancilla in the sequential preparation of unitary entanglers. Despite the negative response from the "no-go" theorem, we demonstrate explicitly how the matrix-product operator (MPO) formalism provides a flexible structure for developing protocols for sequential implementation of such entanglers with optimal fidelity. The proposed numerical technique, which we call the variational matrix-product operator (VMPO) method, offers a computationally efficient tool for characterizing the "globalness" and entangling capabilities of nonlocal unitary operations. Comment: Slightly improved version as published in Phys. Rev.

    Approximate policy iteration: A survey and some new methods

    We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced policy iteration, policy oscillation and chattering, and optimistic and distributed policy iteration. Our discussion of policy evaluation is couched in general terms and aims to unify the available methods in the light of recent research developments and to compare the two main policy evaluation approaches: projected equations and temporal differences (TD), and aggregation. In the context of these approaches, we survey two different types of simulation-based algorithms: matrix inversion methods, such as least-squares temporal difference (LSTD), and iterative methods, such as least-squares policy evaluation (LSPE) and TD(λ), and their scaled variants. We discuss a recent method, based on regression and regularization, which rectifies the unreliability of LSTD for nearly singular projected Bellman equations. An iterative version of this method belongs to the LSPE class of methods and provides the connecting link between LSTD and LSPE. Our discussion of policy improvement focuses on the role of policy oscillation and its effect on performance guarantees. We illustrate that policy evaluation when done by the projected equation/TD approach may lead to policy oscillation, but when done by aggregation it does not. This implies better error bounds and more regular performance for aggregation, at the expense of some loss of generality in cost function representation capability. Hard aggregation provides the connecting link between projected equation/TD-based and aggregation-based policy evaluation, and is characterized by favorable error bounds. National Science Foundation (U.S.) (No. ECCS-0801549); Los Alamos National Laboratory, Information Science and Technology Institute; United States Air Force (No. FA9550-10-1-0412)
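    The matrix-inversion flavor of policy evaluation surveyed here (LSTD) can be sketched on a tiny Markov chain. The 3-state chain, rewards, and features below are illustrative, not from the survey.

```python
import numpy as np

# Sketch of LSTD: accumulate A ~ E[phi(s)(phi(s) - gamma*phi(s'))^T] and
# b ~ E[phi(s) r(s)] along a simulated trajectory, then solve A w = b for
# the weights of a linear value-function approximation. The chain,
# rewards, and features are illustrative.
rng = np.random.default_rng(0)
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.9
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])          # features: constant + state index

A = np.zeros((2, 2)); b = np.zeros(2); s = 0
for _ in range(200_000):
    s2 = rng.choice(3, p=P[s])
    A += np.outer(Phi[s], Phi[s] - gamma * Phi[s2])
    b += Phi[s] * r[s]
    s = s2
w = np.linalg.solve(A, b)

# exact projected fixed point, using the chain's stationary distribution
evals, evecs = np.linalg.eig(P.T)
d = np.real(evecs[:, np.argmin(np.abs(evals - 1))]); d /= d.sum()
D = np.diag(d)
w_exact = np.linalg.solve(Phi.T @ D @ (np.eye(3) - gamma * P) @ Phi,
                          Phi.T @ D @ r)
print("sampled w:", np.round(w, 3), " exact w:", np.round(w_exact, 3))
```

    With enough samples the estimated weights approach the exact projected fixed point of Ίᔀ D(I − γP)Ί w = Ίᔀ D r; the near-singularity issues the survey discusses arise exactly when that matrix is badly conditioned.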

    Generalizing movements with information-theoretic stochastic optimal control

    Stochastic optimal control is typically used to plan a movement for a specific situation. Most stochastic optimal control methods fail to generalize this movement plan to a new situation without replanning; here, a stochastic optimal control method is presented that allows reuse of the obtained policy in a new situation, as the policy is more robust to slight deviations from the initial movement plan. To improve the robustness of the policy, we employ information-theoretic policy updates that explicitly operate on trajectory distributions instead of single trajectories. To ensure a stable and smooth policy update, we limit the "distance" between the trajectory distributions of the old and new control policies. The introduced bound offers a closed-form solution for the resulting policy and extends results from recent developments in stochastic optimal control. In contrast to many standard stochastic optimal control algorithms, the current approach can directly infer the system dynamics from data points, and hence can also be used for model-based reinforcement learning. This paper extends the paper by Lioutikov et al. ("Sample-Based Information-Theoretic Stochastic Optimal Control," Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), IEEE, Piscataway, NJ, 2014, pp. 3896–3902). In addition to revisiting the content, an extensive theoretical comparison of the approach with related work is presented, additional aspects of the implementation are discussed, and further evaluations are introduced
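    The bounded-"distance" update can be sketched in one dimension with a REPS-style step: reweight sampled returns by exp(R/η) and pick the temperature η by bisection so that the KL divergence from the old sample distribution stays under a bound. The quadratic reward and all parameter values are illustrative stand-ins, not the paper's formulation.

```python
import numpy as np

# REPS-style sketch of the bounded-"distance" policy update: reweight
# sampled returns by exp(R/eta), with eta chosen by bisection so that the
# KL divergence of the new sample weights from the old (uniform) ones
# stays below kl_bound. The quadratic reward is an illustrative stand-in
# for a trajectory return.
rng = np.random.default_rng(0)

def reward(a):
    return -(a - 2.0) ** 2

def kl_of_weights(R, eta):
    w = np.exp((R - R.max()) / eta)
    p = w / w.sum()
    return float(np.sum(p * np.log(p * len(p))))     # KL(p || uniform)

mu, sigma, kl_bound = 0.0, 2.0, 0.5
for _ in range(15):
    a = rng.normal(mu, sigma, size=500)              # sample the old policy
    R = reward(a)
    lo, hi = 1e-3, 1e3                               # bisect on temperature
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if kl_of_weights(R, mid) > kl_bound else (lo, mid)
    p = np.exp((R - R.max()) / hi); p /= p.sum()
    mu = float(p @ a)                                # refit Gaussian policy
    sigma = float(np.sqrt(p @ (a - mu) ** 2) + 1e-6)

print(f"learned mean {mu:.2f} (optimum 2.0), std {sigma:.3f}")
```

    Because each update is KL-limited, the policy slides toward the optimum in small, stable steps instead of jumping to the greedy solution, which is the robustness property the abstract emphasizes.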

    Towards a Universal Theory of Artificial Intelligence based on Algorithmic Probability and Sequential Decision Theory

    Decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental probability distribution is known. Solomonoff's theory of universal induction formally solves the problem of sequence prediction for an unknown distribution. We unify both theories and give strong arguments that the resulting universal AIXI model behaves optimally in any computable environment. The major drawback of the AIXI model is that it is uncomputable. To overcome this problem, we construct a modified algorithm, AIXI^tl, which is still superior to any other time-t and space-l bounded agent. The computation time of AIXI^tl is of the order t x 2^l. Comment: 8 two-column pages, latex2e, 1 figure, submitted to ijca
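    A toy illustration of where the t × 2^l cost comes from (this sketches only the cost structure, not the actual AIXI^tl algorithm): enumerate all 2^l candidate "programs" and run each for at most t steps, keeping the best scorer. The bit-table predictors and the alternating sequence are made up for the example.

```python
import itertools

# Cost-structure sketch only, not the actual AIXI^tl algorithm: enumerate
# all 2^l candidate "programs" (length-l bit tables used as toy sequence
# predictors) and run each for at most t steps, keeping the best scorer.
def best_bounded_agent(sequence, l=8, t=50):
    evaluations = 0
    best, best_score = None, -1
    for bits in itertools.product([0, 1], repeat=l):     # 2^l candidates
        score = 0
        for i in range(min(t, len(sequence) - 1)):       # at most t steps each
            score += int(bits[i % l] == sequence[i + 1]) # toy "program" output
            evaluations += 1
        if score > best_score:
            best, best_score = bits, score
    return best, best_score, evaluations

seq = [i % 2 for i in range(60)]                         # alternating 0,1,0,...
prog, score, evals = best_bounded_agent(seq)
print(f"best table {prog} scores {score}/50 using {evals} steps (= t * 2^l)")
```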

    Reducing Electricity Demand Charge for Data Centers with Partial Execution

    Data centers consume a large amount of energy and incur substantial electricity cost. In this paper, we study the familiar problem of reducing data center energy cost with two new perspectives. First, we find, through an empirical study of contracts from electric utilities powering Google data centers, that the demand charge per kW for the maximum power used is a major component of the total cost. Second, many services such as Web search tolerate partial execution of requests, because the response quality is a concave function of processing time; data from the Microsoft Bing search engine confirms this observation. We propose a simple idea of using partial execution to reduce the peak power demand and energy cost of data centers. We systematically study the problem of scheduling partial execution with stringent SLAs on response quality. For a single data center, we derive an optimal algorithm to solve the workload scheduling problem. In the case of multiple geo-distributed data centers, the demand of each data center is controlled by the request routing algorithm, which makes the problem much more involved. We decouple the two aspects, and develop a distributed optimization algorithm to solve the large-scale request routing problem. Trace-driven simulations show that partial execution reduces cost by 3%–10.5% for one data center, and by 15.5% for geo-distributed data centers together with request routing. Comment: 12 pages
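    The core idea fits in a few lines: with quality a concave function of the executed fraction f (sqrt(f) here as a stand-in), capping power via partial execution shaves the demand charge at little quality cost. The tariff rates and synthetic load trace below are illustrative numbers, not data from the paper.

```python
import numpy as np

# Sketch of the partial-execution idea: quality is a concave function of
# the executed fraction f (sqrt(f) as a stand-in), so capping the peak
# costs little quality. The tariff rates and synthetic 5-minute load
# trace are illustrative numbers, not data from the paper.
rng = np.random.default_rng(2)
load = 50 + 30 * rng.random(288) + 40 * (np.arange(288) % 96 > 60)   # kW

def bill(power, energy_rate=0.08, demand_rate=15.0):
    # energy charge per 5-minute slot plus demand charge on the peak kW
    return energy_rate * power.sum() / 12 + demand_rate * power.max()

cap = 100.0
frac = np.minimum(1.0, cap / load)       # partial-execution fractions
capped = load * frac
quality = np.sqrt(frac).mean()           # mean response quality in [0, 1]

print(f"bill: full ${bill(load):.0f} vs capped ${bill(capped):.0f}, "
      f"mean quality {quality:.3f}")
```

    Because quality is concave in f, the slots that are throttled lose only a few percent of quality while the peak (and hence the demand charge) drops to the cap.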

    A Robust Optimization Approach to Inventory Theory


    Fault Tolerant Filtering and Fault Detection for Quantum Systems Driven by Fields in Single Photon States

    The purpose of this paper is to solve a fault tolerant filtering and fault detection problem for a class of open quantum systems driven by a continuous-mode bosonic input field in single photon states when the systems are subject to stochastic faults. Optimal estimates of both the system observables and the fault process are simultaneously calculated and characterized by a set of coupled recursive quantum stochastic differential equations.Comment: arXiv admin note: text overlap with arXiv:1504.0678

    Linearly Parameterized Bandits

    We consider bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of an $r$-dimensional random vector $\mathbf{Z} \in \mathbb{R}^r$, where $r \geq 2$. The objective is to minimize the cumulative regret and Bayes risk. When the set of arms corresponds to the unit sphere, we prove that the regret and Bayes risk are of order $\Theta(r\sqrt{T})$, by establishing a lower bound for an arbitrary policy, and by showing that a matching upper bound is obtained through a policy that alternates between exploration and exploitation phases. The phase-based policy is also shown to be effective if the set of arms satisfies a strong convexity condition. For the case of a general set of arms, we describe a near-optimal policy whose regret and Bayes risk admit upper bounds of the form $O(r\sqrt{T}\log^{3/2} T)$. Comment: 40 pages; updated results and references
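    The exploration/exploitation phase structure on the unit sphere is easy to sketch: estimate Z by pulling r orthonormal directions, then exploit the arm est/‖est‖, which maximizes ⟹x, est⟩ over the sphere. The horizon, noise level, and Z below are illustrative, and the single explore-then-exploit split is a simplification of the paper's alternating phases.

```python
import numpy as np

# Sketch of the exploration/exploitation phase structure for arms on the
# unit sphere: estimate Z by pulling r orthonormal directions, then
# exploit the arm est/||est||, the maximizer of <x, est> over the sphere.
# Horizon, noise level, and Z itself are illustrative.
rng = np.random.default_rng(3)
r, T, noise = 5, 5000, 0.1
Z = rng.standard_normal(r); Z /= np.linalg.norm(Z)

def pull(x):                                   # reward = <x, Z> + noise
    return x @ Z + noise * rng.standard_normal()

n_explore = int(np.sqrt(T))                    # rough exploration phase length
est = np.array([np.mean([pull(e) for _ in range(n_explore)])
                for e in np.eye(r)])
arm = est / np.linalg.norm(est)                # played for the rest of T

regret_per_pull = 1.0 - arm @ Z                # best arm Z/||Z|| earns 1
print(f"estimation error {np.linalg.norm(est - Z):.3f}, "
      f"exploit regret/pull {regret_per_pull:.5f}")
```

    The per-pull regret of the exploit arm shrinks with the square of the estimation error, which is why balancing the two phase lengths yields the $\Theta(r\sqrt{T})$ rate.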
    • 

    corecore